Playing Tetris Using Bandit-Based Monte-Carlo Planning

Authors

  • Zhongjie Cai
  • Bernhard Nebel
Abstract

Tetris is a stochastic, open-ended board game. Existing artificial Tetris players often use different evaluation functions and plan for only one or two pieces in advance. In this paper, we develop an artificial player for Tetris using the bandit-based Monte-Carlo planning method (UCT). In Tetris, game states are often revisited, yet UCT does not retain information about the game states explored in previous planning episodes. We created a method to store such information in a specially designed database to guide the player's future planning. The planner for Tetris also has a high branching factor, so to improve game performance we created a method to prune the planning tree and lower the branching factor. The experimental results show that our player can successfully play Tetris, that its performance improves as the number of played games increases, and that it can defeat a benchmark player with high probability.
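As background for the bandit-based selection the abstract refers to, the UCB1 rule that UCT applies at each tree node can be sketched as follows. This is a minimal illustration only; the dict-based node layout and the exploration constant c = √2 are assumptions for the sketch, not the paper's implementation:

```python
import math

def ucb1_select(children, c=math.sqrt(2)):
    """Pick the child maximizing UCB1: mean reward + exploration bonus.

    Each child is a dict with cumulative 'reward' and a 'visits' count.
    Unvisited children score infinity, so they are always tried first.
    """
    total = sum(ch["visits"] for ch in children)

    def score(ch):
        if ch["visits"] == 0:
            return float("inf")
        mean = ch["reward"] / ch["visits"]
        return mean + c * math.sqrt(math.log(total) / ch["visits"])

    return max(children, key=score)
```

Note how the bonus term shrinks as a child's visit count grows, so heavily sampled moves must keep a high mean reward to stay selected, while rarely sampled moves are periodically revisited.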


Related References

Bandit Based Monte-Carlo Planning

For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to finding near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent, and finite sample bounds are derived on the estimation error due to sampling...


The Parallelization of Monte-Carlo Planning - Parallelization of MC-Planning

Since their impressive successes in various areas of large-scale parallelization, recent techniques like UCT and other Monte-Carlo planning variants (Kocsis and Szepesvari, 2006a) have been extensively studied (Coquelin and Munos, 2007; Wang and Gelly, 2007). We here propose and compare various forms of parallelization of bandit-based tree-search, in particular for our computer-go algorithm XYZ.


Research Summary

Monte-Carlo Tree Search (MCTS) (Coulom 2007; Kocsis and Szepesvári 2006) is an online planning algorithm that combines the ideas of best-first tree search and Monte-Carlo evaluation. Since MCTS is based on sampling, it does not require a transition function in explicit form, but only a generative model of the domain. Because it grows a highly selective search tree guided by its samples, it can ...


On MABs and Separation of Concerns in Monte-Carlo Planning for MDPs

Linking online planning for MDPs with their special case of stochastic multi-armed bandit problems, we analyze three state-of-the-art Monte-Carlo tree search algorithms: UCT, BRUE, and MaxUCT. Using the outcome, we (i) introduce two new MCTS algorithms, MaxBRUE, which combines uniform sampling with Bellman backups, and MpaUCT, which combines UCB1 with a novel backup procedure, (ii) analyze them...
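The contrast this abstract draws, between Monte-Carlo averaging and Bellman-style maximization backups, can be sketched minimally. The dict-based node layout is an assumption for illustration and does not reproduce the actual code of UCT, BRUE, or MaxUCT:

```python
def mc_backup(node, reward):
    """UCT-style Monte-Carlo backup: incremental running average
    of the rollout returns observed through this node."""
    node["visits"] += 1
    node["value"] += (reward - node["value"]) / node["visits"]

def max_backup(node, children):
    """Bellman-style (max) backup: the node takes the value of its
    best child instead of the average over all sampled returns."""
    node["value"] = max(ch["value"] for ch in children)
```

The averaging rule is low-variance but can be biased toward early, suboptimal samples; the max rule propagates the best estimate but is noisier with few samples, which is the trade-off the algorithms above navigate.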


Bandit Algorithms in Game Tree Search: Application to Computer Renju

The multi-armed bandit problem is to maximize a cumulative reward by playing arms sequentially without prior knowledge. Algorithms for this problem such as UCT have been successfully extended to computer Go programs and proved significantly effective by defeating professional players. The goal of the project is to implement a Renju AI based on Monte Carlo planning that is able to defeat the oldest k...



Journal title:

Volume   Issue

Pages  -

Publication date: 2011